WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6:

C12N 15/12, C07K 14/47, C12N 1/21, 
C12Q 1/68 



A1



(11) International Publication Number: WO 97/08312
(43) International Publication Date: 6 March 1997 (06.03.97)



(21) International Application Number: PCT/US96/13598

(22) International Filing Date: 26 August 1996 (26.08.96)



(30) Priority Data:
08/518,862  24 August 1995 (24.08.95)  US



(71) Applicant: THE JOHNS HOPKINS UNIVERSITY [US/US];
720 Rutland Avenue, Baltimore, MD 21205 (US).

(72) Inventors: VOGELSTEIN, Bert; 3700 Breton Way, Baltimore,
MD 21208 (US). KINZLER, Kenneth, W.; 1403 Halkirk
Way, Bel Air, MD 21015 (US). NICOLAIDES, Nicholas,
C.; Apartment 2B, 1 Rosencrans Place, Baltimore, MD
21236 (US).

(74) Agents: HOSCHEIT, Dale, H. et al.; Banner & Witcoff, Ltd.,
11th floor, 1001 G Street N.W., Washington, DC 20001-4597 (US).



(81) Designated States: CA, JP, European patent (AT, BE, CH, DE,
DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).



Published 

With international search report. 

Before the expiration of the time limit for amending the
claims and to be republished in the event of the receipt of
amendments.



(54) Title: HUMAN JTV1 GENE OVERLAPS PMS2 GENE



(57) Abstract 

The hPMS2 gene encodes a protein which is involved in DNA mismatch repair and is mutated in a subset of patients with hereditary
nonpolyposis colon cancer (HNPCC). The previously published hPMS2 cDNA sequence lacks an upstream in-frame stop codon preceding
the presumptive initiating methionine. To further evaluate the 5' terminus of the hPMS2 coding region, we isolated additional cDNA
clones, RT-PCR products, and the corresponding 5' genomic segment of the hPMS2 locus. The hPMS2 gene transcripts were found to
have heterogeneous but collinear 5' termini, one of which contained an in-frame termination codon preceding the initiating methionine. In
addition, a gene encoding a 34.5 kDa polypeptide was found to transcriptionally initiate within hPMS2 from the opposite strand.
peptides from the 85 kDa protein revealed it to be the product of hMLH1, and this
protein's molecular weight agreed with that predicted from the cDNA sequence
(Bronner et al., 1994; Papadopoulos et al., 1994). The sequence of the peptide
generated from the 110 kDa component showed it to be similar to the hPMS2
mutL homolog; however, the predicted molecular weight of hPMS2 is only 95 kDa
(Nicolaides et al., 1994). Since the previously isolated hPMS2 cDNA clones
lacked an in-frame termination codon upstream of the presumptive initiating
methionine, it was possible that the open reading frame extended further upstream.
Thus there is a need in the art for further knowledge of the genetic structures of
and adjacent to the known hPMS2 gene.
SUMMARY OF THE INVENTION 

It is an object of the invention to provide a novel, isolated, human gene on 
chromosome 7. 

It is an object of the invention to provide vectors and host cells for making
a novel human gene product.

It is another object of the invention to provide compositions of matter
containing the human gene product.

These and other objects are provided by one or more of the embodiments
described below. In one embodiment of the invention, a segment of cDNA is
provided. The cDNA consists of the sequence of nucleotides shown in Figure 2.

According to another embodiment of the invention, a vector comprising the
segment of cDNA which consists of the sequence of nucleotides shown in Figure
2 is provided, as well as host cells comprising the vector.

According to still another embodiment of the invention, a composition is
provided. The composition consists essentially of a protein consisting of the amino
acid sequence shown in Figure 2.

In yet another embodiment of the invention, a composition of protein JTV1
as shown in Figure 1 is provided. The composition is free of other human
proteins.
In another embodiment of the invention a segment of cDNA is provided
which segment encodes the amino acid sequence of JTV1 protein shown in
Figure 2.

cDNA probes are also provided by the present invention. The cDNA
portion of said probes consists of between 15 and 1176 contiguous nucleotides of
the sequence shown in SEQ ID NO:1.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the sequence of the 5' region of hPMS2 and predicted
coding region. The arrow indicates the 5' end of the previously published cDNA
clone. The presumptive initiating methionine is underlined.

Figure 2 shows the sequence of JTV1. The sequence has been deposited
in GenBank, accession number U24169. The presumptive initiating methionine is
underlined.

Figure 3 demonstrates the genomic localization of JTV1. The genomic
localization of hPMS2 and JTV1 was confirmed by screening somatic cell hybrids
containing various regions of human chromosome 7. Lane 1, GM10791 contains the
entire chromosome 7 in a Chinese hamster ovary (CHO) background; lane 2,
NA11440 contains 7pter-7p22 in a CHO background; lane 3, Ru-Rag4-13
contains 7cen-7pter in a murine background; lane 4, 4AF1/106/K015 contains
7cen-qter in a murine background; lane 5, GM05184.17 contains 7q21.2-qter in
a CHO background; lane 6, 2068Rag22-2 contains 7q22-qter in a murine
background; lane 7, human genomic DNA; lane 8, mouse genomic DNA; lane 9,
CHO genomic DNA.

Figure 4 demonstrates the mapping of transcriptional start sites of hPMS2
and JTV1. The sequence of the genomic region containing the 5' ends of the two
genes is shown. The sequence is numbered with respect to codon 1 of hPMS2.
Lower case letters denote intronic sequence of JTV1 (from nt -479 to -833) and
hPMS2 (from +24 to +108). Arrows indicate the 5' ends of hPMS2 (sense
strand) and of JTV1 (antisense strand) cDNA clones. The underlined ATG codons
indicate the predicted initiating methionines for hPMS2 (at nt +1 on the sense
strand) and JTV1 (at nt -345 on the antisense strand). The sequence has been
deposited in GenBank, accession number U24168.

Figure 5 shows the expression of hPMS2 and JTV1. RNA from various
tissues was incubated with reverse transcriptase (RT+) or in control reactions
without reverse transcriptase (RT-). The cDNA was used as template for PCR
with primers specific for hPMS2 (A) and JTV1 (B). RT-PCR products were
separated by polyacrylamide gel electrophoresis.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

To investigate the upstream region from hPMS2, we isolated additional
cDNA clones, analyzed the 5' end of hPMS2 transcripts with PCR-based
techniques, and cloned the corresponding genomic segments. In addition to
clarifying the transcript, we serendipitously discovered a previously undescribed
gene overlapping hPMS2. That gene is termed herein JTV1. The sequences of the
JTV1 cDNA and protein are shown in SEQ ID NOS:1 and 2, respectively.

A segment of cDNA according to the present invention refers to a
contiguous stretch of deoxyribonucleotides which have a sequence as obtained upon
reverse transcription of an RNA transcript. Such segments do not contain introns.
The segment may be an isolated molecule or it can be covalently joined to other
nucleic acid sequences. The segment may, for example, be replicated as part of
a vector, such as a plasmid, virus, or minichromosome. The vector may be
replicated within a host cell, such as a cell transformed by a recombinant DNA
molecule. The host cell may be used to produce JTV1 protein. It can also be
used to study regulation of expression of JTV1 sequences, for example by
subjecting the host cell to various agents which may or may not affect the
expression. Although the DNA sequence is discussed with particularity herein, it
is well within the skill of the art to make small mutations, such as single nucleic
acid substitutions of one of the other three nucleic acid bases, at any of the
positions of the sequence. In addition, it is well within the art to make single base
deletions or single base insertions, to study the effect upon protein structure and
function.
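The single-base substitutions, deletions, and insertions mentioned above can be enumerated mechanically. The following Python sketch is an illustration only, assuming the sequence is an uppercase string over A, C, G, and T; the function name `single_base_variants` is hypothetical and not part of the disclosure.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def single_base_variants(seq: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (kind, position, variant_sequence) for every single-base change.

    Positions are 0-based. An insertion at position i places the new base
    before seq[i]; position len(seq) appends it at the 3' end.
    """
    seq = seq.upper()
    for i, base in enumerate(seq):
        # Substitution: replace this base with each of the other three bases.
        for alt in BASES:
            if alt != base:
                yield ("substitution", i, seq[:i] + alt + seq[i + 1:])
        # Deletion: drop this base.
        yield ("deletion", i, seq[:i] + seq[i + 1:])
    # Insertion: any of the four bases in any of the len(seq) + 1 gaps.
    for i in range(len(seq) + 1):
        for alt in BASES:
            yield ("insertion", i, seq[:i] + alt + seq[i:])


if __name__ == "__main__":
    kinds = [kind for kind, _, _ in single_base_variants("ATGCCG")]
    print(kinds.count("substitution"), kinds.count("deletion"), kinds.count("insertion"))
```

For a sequence of length n this yields 3n substitutions, n deletions, and 4(n + 1) insertions (18, 6, and 28 for the six-base example). Some of these coincide, for instance deleting either base of a repeated pair gives the same sequence, so collecting the variants in a set gives only the distinct sequences.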
If JTV1 is produced in a recombinant host cell which is not human, a
composition of JTV1 protein will be produced which is free of other human
proteins. If JTV1 protein is isolated from naturally producing cells, or from
human host cells, then the protein can be purified, for example, using antibodies
which are raised against an immunogen comprising JTV1 amino acid sequence.
Any other means of purification known in the art can be used, as is desired.

DNA molecules can be made having different nucleotide sequences from
that disclosed in SEQ ID NO:1, but which still encode the JTV1 protein as
disclosed in SEQ ID NO:2. Using the known coding relationships between codons
and amino acids and the disclosed amino acid sequence, numerous other sequences
can be readily designed and produced. Such DNA molecules are within the
contemplation of the subject invention.
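The design of alternative DNA sequences encoding the same protein can be sketched with the standard genetic code. In this minimal Python example the names `back_translate` and `count_encodings` are hypothetical, residues use one-letter codes, and the codon-choice rule `pick` is an arbitrary assumption (any rule that returns a synonymous codon works). The peptide used is the first ten residues of JTV1 as listed (Met Pro Met Tyr Gln Val Lys Pro Tyr His).

```python
from itertools import product
from math import prod

# Standard genetic code; codons enumerated in the conventional T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

# Reverse lookup: one-letter amino acid code -> all codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna: str) -> str:
    """Translate complete codons of an uppercase DNA string ('*' marks a stop)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def back_translate(protein: str, pick=lambda codons: codons[0]) -> str:
    """Return one DNA sequence encoding `protein`, choosing a codon per residue."""
    return "".join(pick(SYNONYMS[aa]) for aa in protein)


def count_encodings(protein: str) -> int:
    """Number of distinct codon sequences (stop codon excluded) encoding `protein`."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


if __name__ == "__main__":
    peptide = "MPMYQVKPYH"
    first = back_translate(peptide)
    last = back_translate(peptide, pick=lambda codons: codons[-1])
    print(first, last, translate(first) == translate(last) == peptide)
    print(count_encodings(peptide))
```

Because every synonymous choice translates back to the same peptide, `count_encodings` shows how quickly the number of possible sequences grows: 2048 for these ten residues alone.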

cDNA probes can be used for hybridization studies. Typically they are
labeled with a detectable marker, such as a radiolabel or a fluorescent moiety,
although they need not be. The cDNA probes of the subject invention consist of
at least 15 contiguous nucleotides of the sequence shown in SEQ ID NO:1. If
greater specificity is desired, larger molecules of 18, 20, 25, or 30 nucleotides can
be used, up to a maximum of the entire sequence of 1176 nucleotides.
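The range of probes described above can be enumerated as sliding windows over the sequence. This Python sketch is illustrative; `probe_windows`, `reverse_complement`, and `count_windows` are hypothetical helper names, and the count refers to window positions rather than distinct sequences (a repeated stretch yields identical probes at different starts).

```python
from typing import Iterator, Tuple


def probe_windows(seq: str, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, probe) for each contiguous window; start is 1-based."""
    if not 1 <= length <= len(seq):
        raise ValueError("probe length must be between 1 and len(seq)")
    for i in range(len(seq) - length + 1):
        yield i + 1, seq[i:i + length]


def reverse_complement(seq: str) -> str:
    """Complementary strand read 5' to 3' (the form of an antisense probe)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_windows(seq_len: int, min_len: int = 15) -> int:
    """Count all windows with min_len <= length <= seq_len.

    A window of length k can start at seq_len - k + 1 positions, so the total
    is 1 + 2 + ... + n = n(n + 1)/2 with n = seq_len - min_len + 1.
    """
    n = seq_len - min_len + 1
    return n * (n + 1) // 2 if n > 0 else 0


if __name__ == "__main__":
    print(list(probe_windows("ACGTAC", 4)))
    print(count_windows(1176))
```

For the 1176-nucleotide sequence and a 15-nucleotide minimum, `count_windows(1176)` returns 675,703 possible (start, length) windows.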

JTV1 cDNAs can be used as probes to detect deletions in chromosome 7.
Due to the overlapping promoter regions, large deletions of JTV1 would also be
expected to affect PMS2 expression, leading to Hereditary Non-Polyposis
Colorectal Cancer (HNPCC). JTV1 cDNA can be used in chromosome mapping.
It can also be used to assay activity or competence of the PMS2 promoter region.
The presence of JTV1 transcripts or JTV1 protein suggests that the PMS2 promoter
is intact. If the PMS2 promoter is intact and PMS2 products are absent, a
structural defect in the coding region is indicated.
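The reasoning in the preceding paragraph can be written as a small decision rule. The function below is a hypothetical illustration that restates only the inferences given in the text; it is not a validated diagnostic procedure, and the case where JTV1 is not detected is left inconclusive because the text draws no conclusion from it.

```python
def interpret_assay(jtv1_detected: bool, pms2_detected: bool) -> str:
    """Restate the promoter-competence inference as a decision rule.

    JTV1 and PMS2 share an overlapping promoter region, so detecting JTV1
    transcripts or protein is taken as evidence that the region is intact.
    """
    if not jtv1_detected:
        # No conclusion about the promoter is drawn when JTV1 is absent.
        return "inconclusive"
    if pms2_detected:
        return "promoter intact; PMS2 expressed"
    # Promoter region appears intact, yet PMS2 products are missing.
    return "promoter intact; PMS2 coding-region defect indicated"


if __name__ == "__main__":
    for jtv1 in (True, False):
        for pms2 in (True, False):
            print(jtv1, pms2, interpret_assay(jtv1, pms2))
```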

JTV1 sequences can be used to guide homologous recombination at the
PMS2 locus. For example, where a PMS2 mutation is present and therapeutic
replacement with a wild-type gene is desired, PMS2 sequences can be used to
provide an adjacent region of homology. Similarly, it may be desirable to target
other genes to the region adjacent to PMS2. JTV1 sequences can be used to flank
such other genes, providing one or more regions of homology. If insertion of
other genes is desired between the JTV1 and the PMS2 sequences, again, this can
be accomplished using the identified sequences as homology units for homologous
recombination.
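Assembling a donor molecule in which homology units flank the sequence to be introduced can be sketched as simple string operations. This Python example is a toy under stated assumptions: the locus string, the insertion index, and the arm length are hypothetical inputs, with the left and right arms standing in for JTV1-side and PMS2-side sequence; suitable arm lengths are an experimental choice not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetingConstruct:
    left_arm: str   # copy of the locus immediately 5' of the insertion point
    payload: str    # sequence to be introduced (e.g. a replacement gene)
    right_arm: str  # copy of the locus immediately 3' of the insertion point

    def assemble(self) -> str:
        return self.left_arm + self.payload + self.right_arm


def design_construct(locus: str, insert_at: int, payload: str, arm_len: int) -> TargetingConstruct:
    """Flank `payload` with arm_len bases of homology on each side of insert_at.

    insert_at is a 0-based gap index: the payload goes between
    locus[insert_at - 1] and locus[insert_at].
    """
    if arm_len < 1 or insert_at - arm_len < 0 or insert_at + arm_len > len(locus):
        raise ValueError("homology arms would extend past the supplied locus sequence")
    return TargetingConstruct(
        left_arm=locus[insert_at - arm_len:insert_at],
        payload=payload,
        right_arm=locus[insert_at:insert_at + arm_len],
    )


if __name__ == "__main__":
    toy_locus = "AAAACCCCGGGGTTTT"  # hypothetical placeholder sequence
    print(design_construct(toy_locus, 8, "NNN", 4).assemble())
```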

Isolation and sequence analysis of the 5' end of hPMS2.

Purified DNA from P1 clone 53, previously determined to contain the
hPMS2 gene (Nicolaides et al., 1994), was digested with EcoRI and subcloned
into the pBluescript vector (Stratagene). Clones containing the 5' region of hPMS2
were identified by hybridization with primer A (Table 1) directed to exon 1.
Restriction analysis of several positive clones showed them to be identical. The
sequence of the relevant region of hPMS2 was determined from both strands using
α-dATP and Sequenase (USB).
Table 1. Primers used for hPMS2.

PRIMER  STRAND     PRIMER SEQUENCE                  POSITION*
A       sense      5'-cgggtgttgcatccatgg-3'         -14 to +4
B       sense      5'-gggtggagcacaacgtcg-3'         -110 to -93
C       sense      5'-ggtcacgacggagaccg-3'          -283 to -267
D       sense      5'-tgcaggtgggaagctccacacgg-3'    -414 to -392
E       sense      5'-tagctcctgccgtgcacg-3'         -448 to -431
F       sense      5'-cgctcctacctgcacgtg-3'         -487 to -470
G       antisense  5'-tagactcagtaccacctgc-3'        +90 to +107
H       sense      5'-tacagaacctgctaaggcc-3'        +24 to +42
I       antisense                                   +116 to +136
J       sense      5'-caaccatgagacacatcgc-3'        +2545 to ...
K       antisense  5'-aggttagtgaagactctgtc-3'       +2647 to +2666

* Relative to the presumptive initiating methionine in Figure 1.
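Because the numbering in Table 1 has no position 0 (it runs from -1 directly to +1 at the presumptive initiating methionine), each primer's length can be checked against its listed span. The Python sketch below covers the entries with complete data; `span_length` and `length_mismatches` are hypothetical names.

```python
def span_length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive interval on a numbering with no 0.

    Positions run ..., -2, -1, +1, +2, ..., so an interval that crosses the
    initiating ATG contains one fewer position than end - start + 1.
    """
    lo, hi = min(start, end), max(start, end)
    n = hi - lo + 1
    return n - 1 if lo < 0 < hi else n


# Table 1 entries that have both a full sequence and a full position range.
TABLE_1 = {
    "A": ("cgggtgttgcatccatgg", -14, 4),
    "B": ("gggtggagcacaacgtcg", -110, -93),
    "C": ("ggtcacgacggagaccg", -283, -267),
    "D": ("tgcaggtgggaagctccacacgg", -414, -392),
    "E": ("tagctcctgccgtgcacg", -448, -431),
    "F": ("cgctcctacctgcacgtg", -487, -470),
    "G": ("tagactcagtaccacctgc", 90, 107),
    "H": ("tacagaacctgctaaggcc", 24, 42),
    "K": ("aggttagtgaagactctgtc", 2647, 2666),
}


def length_mismatches(table: dict) -> list:
    """Names of primers whose sequence length differs from their position span."""
    return [name for name, (seq, start, end) in table.items()
            if len(seq) != span_length(start, end)]


if __name__ == "__main__":
    print(length_mismatches(TABLE_1))
    print(span_length(-414, 16))
```

Applied to these entries, only primer G disagrees: its listed sequence has 19 nucleotides while +90 to +107 spans 18, and the table alone does not show which value is intended. The same function reproduces the 430 bp length of the probe spanning nt -414 to +16.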



Three clones were isolated, each containing an 8.8 kb EcoRI insert. Partial
sequence analysis of one clone, pSMN, determined that it contained coding
residues of hPMS2 as well as sequences upstream of the previously designated
codon 1. The presumptive initiating codon reported previously has been
designated as nucleotide 1 in Figure 1. The sequence of hPMS2 was extended 833
bp upstream of nucleotide 1. This sequence revealed an in-frame stop codon 321
nt upstream of the published initiator methionine, with no intervening methionines
(Figure 1).
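The search for an in-frame upstream stop codon can be sketched as a codon-by-codon walk toward the 5' end. In this hypothetical Python helper, distance is measured from the first base of the stop codon to the first base of the ATG; with the numbering of SEQ ID NO:1 (TAG at nucleotides 43 to 45, ATG at 364 to 366) that convention gives the 321 nt stated above, but the convention itself is an assumption of the sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_in_frame_stop(seq: str, atg_index: int):
    """Walk upstream from an ATG, codon by codon, in the same reading frame.

    Returns (distance, intervening_atgs). distance runs from the first base
    of the nearest in-frame stop codon to the first base of the ATG, or is
    None if the sequence ends first. intervening_atgs counts in-frame ATGs
    met before the stop (or before the sequence ends).
    """
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("seq has no ATG at atg_index")
    intervening = 0
    for i in range(atg_index - 3, -1, -3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return atg_index - i, intervening
        if codon == "ATG":
            intervening += 1
    return None, intervening


if __name__ == "__main__":
    toy = "TAG" + "CCC" * 5 + "ATGGGG"  # hypothetical example sequence
    print(upstream_in_frame_stop(toy, 18))
```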
Example 2

Isolation of additional cDNA clones using an hPMS2 probe.

Two cDNA libraries were screened with a probe containing nt +24 to
+136 of hPMS2 generated by PCR using P1 clone 53 as template and the primers
H and I (Table 1). A human small intestine random-primed cDNA library in
λgt10 (Clontech) and a HeLa oligo(dT)-primed cDNA library in λZAPII
(Stratagene) were screened as described, except that hybridizations were carried out at
68°C and filters were washed at 65°C for 0.5 hr (Kinzler and Vogelstein, 1989).
Following plaque purification, the EcoRI inserts from the small intestine library
were subcloned into the pBluescript vector, while the HeLa cDNA inserts were
rescued as phagemids following the manufacturer's protocol (Stratagene).

One clone was isolated from the random-primed small intestine library, and
this contained nt -14 to nt +1668 of hPMS2. Two clones were isolated from the
oligo(dT)-primed HeLa cDNA library. These clones began at nt -53 and ended at
either nt +2722 or +2749. The HeLa cDNA library was also screened with a
430 bp probe from the 5' genomic region of hPMS2, containing nt -414 to +16,
generated by PCR from P1 clone 53 using primers D (Table 1) and O (Table 2).
The same two clones were identified, as expected. However, twelve other
overlapping clones were found and appeared to represent a different transcript,
named JTV1 (Figure 2). These twelve cDNAs were approximately 1.2 kb in
length and were sequenced in their entirety. All twelve ended with a polyA tract
(assumed to be the 3' end) and were identical for 1.2 kb upstream. The 5' ends
were located within 38 bp of each other. Comparison with hPMS2 indicated that
JTV1 was transcribed from the opposite strand.
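Because JTV1 and hPMS2 are read from opposite strands and both numberings skip position 0, converting coordinates between them is easy to get wrong by one. The following is a minimal Python sketch, assuming (from the Figure 4 legend) that JTV1 nt +1 lies at hPMS2 nt -345; the function names are hypothetical.

```python
JTV1_PLUS1_IN_PMS2 = -345  # hPMS2 coordinate of JTV1 nt +1 (Figure 4 legend)


def _to_index(pos: int) -> int:
    """Map a coordinate that skips 0 (..., -1, +1, ...) onto consecutive integers."""
    if pos == 0:
        raise ValueError("this numbering has no position 0")
    return pos - 1 if pos > 0 else pos


def _from_index(idx: int) -> int:
    """Inverse of _to_index."""
    return idx + 1 if idx >= 0 else idx


def flip_strand_coordinate(pos: int) -> int:
    """Convert between hPMS2 and JTV1 numbering across the shared region.

    The genes lie on opposite strands, so one step downstream in hPMS2 is one
    step upstream in JTV1. The map is its own inverse: apply it to an hPMS2
    coordinate to get JTV1, or to a JTV1 coordinate to get hPMS2.
    """
    return _from_index(_to_index(JTV1_PLUS1_IN_PMS2) - _to_index(pos))


if __name__ == "__main__":
    print(flip_strand_coordinate(-2), flip_strand_coordinate(16))
```

With this anchor, hPMS2 nt -2 to +16 (the site of primer O) maps to JTV1 nt -343 to -360, as listed in Table 2, and hPMS2 nt -232 maps to JTV1 nt -113. Writing the map as an involution means one function serves both directions.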
Table 2. Primers used for JTV1 cDNA amplification.

PRIMER  STRAND     PRIMER SEQUENCE                  POSITION*
L       sense      5'-gttctgccatgccgatg-3'          -8 to +9
M       sense      5'-ggcctttggcacgcgctac-3'        -23 to -41
N       sense      5'-accggactgcgcmcccg-3'          -111 to -129
O       sense      5'-tctcagctcgctccatgg-3'         -343 to -360
P       antisense  5'-gcagagacaggttagactc-3'        +139 to +157
Q       sense      5'-gctccttaagtgaattgccg-3'       +952 to +971
R       antisense  5'-tgacacttgacaactggcc-3'        +1068 to +1086

* Relative to the presumptive initiating methionine in Figure 2.



Example 3

JTV1.

The length of one clone representative of JTV1 (M23NNFL) was 1233 bp
and encoded an open reading frame (ORF) of 936 bp (Figure 2). The first
methionine within this ORF was designated codon 1 (Figure 2) and was preceded
by an in-frame termination codon 66 bp upstream. This methionine had a
reasonable match to the Kozak translation initiation consensus (Kozak, 1986). The
3' end contained a polyadenylation signal (AAUAAA) starting at nucleotide 1086
followed by a polyA tail. The transcript was predicted to encode a polypeptide of
312 amino acids, with a molecular weight of 34.5 kDa. Searches of nucleotide and
peptide sequence databases showed that this was a novel gene, with limited
homology to the glutathione S-transferase gene family.
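Several of the features described above can be checked mechanically. The Python sketch below is illustrative: `kozak_check` looks only at positions -3 and +4 (a common simplification of the Kozak consensus), `find_motif` scans for the AATAAA polyadenylation signal in DNA form, and `residues_from_orf` converts an ORF length to residues. The context string is the sequence around the JTV1 ATG as printed in SEQ ID NO:3 (nucleotides 107 to 119); the other example string is hypothetical.

```python
def kozak_check(seq: str, atg_index: int) -> dict:
    """Inspect the two positions most often cited for the Kozak consensus.

    Position -3 is three bases upstream of the A of ATG; +4 is the base
    immediately after the ATG. atg_index is 0-based.
    """
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("seq has no ATG at atg_index")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    return {"purine_at_-3": minus3 in ("A", "G"), "G_at_+4": plus4 == "G"}


def find_motif(seq: str, motif: str = "AATAAA") -> list:
    """0-based start positions of every occurrence of motif (overlaps allowed)."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def residues_from_orf(orf_bp: int, includes_stop: bool = False) -> int:
    """Codons in an ORF, minus one if its length counts the stop codon."""
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - (1 if includes_stop else 0)


if __name__ == "__main__":
    print(kozak_check("TTCTGCCATGCCG", 7))
    print(find_motif("GTCAATAAAAGC"))
    print(residues_from_orf(936))
```

For JTV1 the check finds a purine (G) at -3 but a C rather than a G at +4, which fits the text's description of a reasonable rather than perfect match. A 936 bp ORF corresponds to 312 residues only if the stop codon is excluded, consistent with the CDS annotation 114..1049 in SEQ ID NO:3.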
Chromosomal mapping of JTV1.

The hPMS2 locus was previously mapped to chromosome 7p22 by FISH
using P1 clone 53 (Nicolaides et al., 1994). Because multiple hPMS2-related
genes are located on the long arm of chromosome 7 and have conserved 5' regions
(personal observation; Horii et al., 1994), we confirmed the genomic localization
of JTV1 by PCR analysis of rodent-human somatic cell hybrid DNAs containing
various regions of chromosome 7 (Scherer et al., 1993; Powers et al., 1993).
PCR primers were chosen from the 3' untranslated regions of hPMS2 and JTV1 and
shown to amplify genomic DNA. hPMS2 primers J and K yielded a 121 bp
product and JTV1 primers Q and R yielded a 134 bp product. PCR products for
both genes were formed in those DNAs containing the 7p22 region: lines
GM10791 (containing the entire human chromosome 7), NA11440 (Coriell
Institute) (7p22-7pter), and Ru-Rag4-13 (7cen-7pter) (Figure 3, lanes 1, 2, and 3).
No products were observed in lines 4AF1/106/K015 (7cen-qter), GM05184.17
(7q21.2-qter), or 2068Rag22-2 (7q22-qter) (Figure 3, lanes 4, 5, and 6).
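The somatic cell hybrid result is a set intersection: the locus must lie in a region retained by every positive hybrid and absent from every negative one. In this Python sketch the partition of chromosome 7 into five segments is a hypothetical simplification chosen only so that each hybrid can be written as a set; it does not represent real band boundaries.

```python
# Hypothetical, coarse partition of chromosome 7 (pter -> qter), chosen only so
# that every hybrid described in the text can be expressed as a set of segments.
SEGMENTS = ("7p22-pter", "7cen-7p22", "7cen-7q21.2", "7q21.2-7q22", "7q22-qter")

# Hybrid line -> (segments of human chromosome 7 retained, PCR product seen?)
HYBRIDS = {
    "GM10791": (frozenset(SEGMENTS), True),
    "NA11440": (frozenset({"7p22-pter"}), True),
    "Ru-Rag4-13": (frozenset({"7p22-pter", "7cen-7p22"}), True),
    "4AF1/106/K015": (frozenset({"7cen-7q21.2", "7q21.2-7q22", "7q22-qter"}), False),
    "GM05184.17": (frozenset({"7q21.2-7q22", "7q22-qter"}), False),
    "2068Rag22-2": (frozenset({"7q22-qter"}), False),
}


def localize(hybrids: dict) -> frozenset:
    """Segments present in every positive hybrid and absent from every negative one."""
    positives = [segs for segs, product in hybrids.values() if product]
    negatives = [segs for segs, product in hybrids.values() if not product]
    if not positives:
        return frozenset()
    candidates = frozenset.intersection(*positives)
    for segs in negatives:
        candidates = candidates - segs
    return candidates


if __name__ == "__main__":
    print(sorted(localize(HYBRIDS)))
```

With the hybrid panel above, the only surviving segment is 7p22-pter, matching the 7p22 assignment in the text.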

Analysis of the 5' termini of hPMS2 and JTV1.

The 5' termini of hPMS2 transcripts were studied by standard cDNA
cloning, RACE, and RT-PCR analyses. RNA was purified from tissues and cells
using a guanidine isothiocyanate based method (Chomczynski and Sacchi, 1987).
Reverse transcriptase-polymerase chain reaction (RT-PCR) was performed using
randomly primed cDNA as template as described (Leach et al., 1993). RT-PCR
of the 5' end of hPMS2 was performed using a common antisense primer (I) and
the sense primers (A-F) described in Table 1. RT-PCR mapping of the 5' end of
JTV1 was done using a common antisense primer P and the sense primers L-O as
described in Table 2. RACE (rapid amplification of cDNA ends; Frohman et al.,
1988) was performed on hPMS2 using sequential antisense primers I and G (Table
1) following the manufacturer's protocol (Clontech). RACE analysis of JTV1 was
done using the antisense primer P (Table 2). Amplification products were cloned
into a T-tailed vector (Invitrogen) and sequenced using SP6 and T7 primers.
Amplifications were done at 95°C for 30 sec, 56°C for 1.5 min, and 70°C for
1.5 min for 35 cycles. Reaction products were separated by electrophoresis in 6%
nondenaturing polyacrylamide gels.
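The thermal cycling program above can be represented as data, which makes its total duration explicit. In this Python sketch the `Step` class and `hold_minutes` function are hypothetical; ramp times and any initial or final holds, which the text does not specify, are ignored, and reading the three steps as denaturation, annealing, and extension is the conventional interpretation rather than a statement from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


# The cycle described in the text (conventionally: denature, anneal, extend).
CYCLE = (Step(95, 30), Step(56, 90), Step(70, 90))
CYCLES = 35


def hold_minutes(cycle, cycles: int) -> float:
    """Total time at temperature, ignoring ramp times and any extra holds."""
    return cycles * sum(step.seconds for step in cycle) / 60


if __name__ == "__main__":
    print(hold_minutes(CYCLE, CYCLES))
```

Thirty-five cycles of 210 s each give 7350 s, or 122.5 minutes at temperature.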

Figure 4 shows the sequence of the genomic region containing the
transcriptional initiation sites of both hPMS2 and JTV1, numbered as in Figure 1
with respect to hPMS2. The 5' ends of hPMS2 cDNA clones are marked with
arrowheads on the top strand. One clone began at nt -14, one at nt -24, and two
at nt -53. RACE products were generated from adult brain, leukocyte, and
placenta mRNA. Using an antisense primer corresponding to nt +116 to +136,
multiple bands of approximately 160 to 191 bp were observed in addition to
less intense bands of up to 550 bp. The sequence of four cloned RACE products
demonstrated that, as expected, their 5' ends were located between nt -25 and -55.
These data suggested that the majority of hPMS2 transcripts initiated between nt
-13 and -55, with a minority extending further upstream. This was confirmed by
RT-PCR analysis using mRNA from HeLa cells as template. Robust RT-PCR
products were amplified with sense primers whose 5' ends were at nt -14, -110,
-283, and -414 (primers A, B, C, and D; Table 1) and an antisense primer
corresponding to nt +90 to +107 (G). No PCR products were observed using
sense primers whose 5' ends were at nt -448 or -487 (primers E and F). To
ensure that primers E and F were not defective, successful amplification of
genomic DNA was performed using these primers and an antisense primer (O)
corresponding to nt -2 to +16.
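The logic of the RT-PCR mapping, where a product from a sense primer implies transcripts that reach at least that primer's 5' end, can be sketched as follows. The function name and the dictionary layout are hypothetical, and the sketch assumes, as the genomic control with primers E and F supports, that a missing product reflects missing template rather than a defective primer.

```python
def bracket_upstream_end(results: dict):
    """Bound the most upstream transcript 5' end from RT-PCR results.

    results maps each sense primer's 5'-end coordinate (negative values lie
    upstream of the ATG) to True if it gave an RT-PCR product. A product
    implies some transcripts reach at least that primer's 5' end.
    Returns (reached, not_reached): the most upstream primer that worked and
    the nearest primer further upstream that did not (None if absent).
    """
    worked = [pos for pos, ok in results.items() if ok]
    if not worked:
        return None, None
    reached = min(worked)
    failed_beyond = [pos for pos, ok in results.items() if not ok and pos < reached]
    return reached, (max(failed_beyond) if failed_beyond else None)


# hPMS2 results from the text: primers A-D amplified, E and F did not.
HPMS2_RT_PCR = {-14: True, -110: True, -283: True, -414: True, -448: False, -487: False}

if __name__ == "__main__":
    print(bracket_upstream_end(HPMS2_RT_PCR))
```

For the hPMS2 data this returns (-414, -448): some transcripts extend at least to nt -414, and none were detected reaching nt -448.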

The 5' termini of JTV1 showed a heterogeneous pattern like that of hPMS2.
The 5' ends of the 12 cDNA clones are indicated by arrowheads on the bottom
strand in Figure 4. They were located 73 to 113 nt upstream of codon 1 of
JTV1, which corresponded to nt -271 to -232 of hPMS2. RACE confirmed the
cDNA results in that the majority of products generated using an antisense primer
P corresponding to JTV1 nt +157 were 230 to 270 bp. RT-PCR analysis was
performed with antisense primer P and several sense primers (L-O) listed in Table
2. PCR products were found with sense primers whose 5' ends were at -8, -23,
and -111 (primers L, M, and N) but not with a sense primer O whose 5' end was
at nt -360 with respect to JTV1 nt +1. The latter primer was not defective, as
a genomic segment could be successfully amplified with it.

Transcripts of hPMS2 had heterogeneous but collinear 5' termini,
containing 11 to 415 nt of presumably untranslated sequence. The transcripts
contained an in-frame stop codon upstream of the presumptive initiating
methionine (Figure 1), making the originally described methionine the most likely
translation initiator. Because no other upstream coding regions of hPMS2
appeared to exist, the size discrepancy between the molecular weight predicted from the hPMS2
sequence and the 110 kDa hPMS2 protein identified by Li and Modrich is likely
due to post-transcriptional modifications or alternative internal exons.

Our results revealed that hPMS2 overlaps with a novel gene, JTV1,
transcribed from the opposite strand (Figure 4). This organization is similar to
that of HUMDUG, a mutS homolog found on human chromosome 5, and the
dihydrofolate reductase (DHFR) gene (Fujii and Shimada, 1989). Both hPMS2-JTV1
and HUMDUG-DHFR lie in a head-to-head arrangement, in both pairs the genes are
ubiquitously expressed, and both pairs have multiple 5' termini. It has been
hypothesized that DHFR and HUMDUG may be regulated via a bidirectional
promoter, because a minor subset of the transcripts from the two genes overlap.
The major transcripts of HUMDUG and DHFR, however, do not overlap, as is
true for hPMS2 and JTV1. It will be of interest to determine whether other
mismatch repair genes are arranged in a head-to-head fashion with a contiguous
gene and whether JTV1 is involved in DNA replication or repair.

Example 6 

Expression of hPMS2 and JTV1.

The expression of hPMS2 and JTV1 was analyzed in a variety of mRNA
samples prepared from human tissues. RT-PCR was performed on cDNA
templates derived from adult brain, leukocytes, kidney, large intestine, colon,
salivary gland, lung, testes and prostate using primers J and K for hPMS2 and
primers Q and R for JTV1 (Tables 1 and 2). Both genes were expressed in all
tissues tested (Figure 5).
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Vogelstein, Bert

Kinzler, Kenneth W.
Nicolaides, Nicholas C.

(ii) TITLE OF INVENTION: Human JTV1 Gene Overlaps PMS2 Gene

(iii) NUMBER OF SEQUENCES: 5

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Banner & Allegretti, Ltd.

(B) STREET: 1001 G Street, NW

(C) CITY: Washington DC

(E) COUNTRY: U.S.A.

(F) ZIP: 20001

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 
(A) APPLICATION NUMBER:

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Kagan, Sarah A.

(B) REGISTRATION NUMBER: 32,141

(C) REFERENCE/DOCKET NUMBER: 1107.49697

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 202-508-9100 

(B) TELEFAX: 202-508-9299 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 384 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 46..384
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TTACCTGCTA CATCGGCATG GCAGAACCAA AGCAAAAGGG GGTAG CGC GTG CCA 54

Arg Val Pro

1

AAG GCC AAC GCT CAG AAA CCC TCA GAG GTC ACG ACG GAG ACC GGC CAC 102
Lys Ala Asn Ala Gln Lys Pro Ser Glu Val Thr Thr Glu Thr Gly His
5 10 15

CTC CCT TCT GAC CCT GCT GCG GGC GTT CGG GAA AAC GCA GTC CGG TGT 150
Leu Pro Ser Asp Pro Ala Ala Gly Val Arg Glu Asn Ala Val Arg Cys
20 25 30 35

GCT CTG ATT GGC CCA GGC TCT TTG ACG TCA CGA ACT CGA CCT TTG ACA 198
Ala Leu Ile Gly Pro Gly Ser Leu Thr Ser Arg Ser Arg Pro Leu Thr
40 45 50

GAG CCA ATA CGC GAA AAC GAG AGA CGG GAA GTA TTT TTG CCG CCC CGC 246
Glu Pro Ile Gly Glu Lys Glu Arg Arg Glu Val Phe Leu Pro Pro Arg
55 60 65

CCC GAA AGG GTG GAG CAC AAC GTC GAA AGC AGC CAA TGG GAG TTC AGG 294
Pro Glu Arg Val Glu His Asn Val Glu Ser Ser Gln Trp Glu Phe Arg
70 75 80

AGG CGG AGC GCC TGT GGG AGC CCT GGA GGG AAC TTT CCC AGT CCC CGA 342
Arg Arg Ser Ala Cys Gly Ser Pro Gly Gly Asn Phe Pro Ser Pro Arg
85 90 95

GGC GGA TCG GGT GTT GCA TCC ATG GAG CGA GCT GAG AGC TCG 384
Gly Gly Ser Gly Val Ala Ser Met Glu Arg Ala Glu Ser Ser
100 105 110



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 113 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Arg Val Pro Lys Ala Asn Ala Gln Lys Pro Ser Glu Val Thr Thr Glu
1 5 10 15

Thr Gly His Leu Pro Ser Asp Pro Ala Ala Gly Val Arg Glu Asn Ala
20 25 30

Val Arg Cys Ala Leu Ile Gly Pro Gly Ser Leu Thr Ser Arg Ser Arg
35 40 45

Pro Leu Thr Glu Pro Ile Gly Glu Lys Glu Arg Arg Glu Val Phe Leu
50 55 60

Pro Pro Arg Pro Glu Arg Val Glu His Asn Val Glu Ser Ser Gln Trp
65 70 75 80

Glu Phe Arg Arg Arg Ser Ala Cys Gly Ser Pro Gly Gly Asn Phe Pro
85 90 95

Ser Pro Arg Gly Gly Ser Gly Val Ala Ser Met Glu Arg Ala Glu Ser
100 105 110

Ser



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1233 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 114..1049



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CCGAACGCCC GCAGCACGGT CAGAAGGGAG GTGGCCGGTC TCCGTCGTGA CCTCTGACGG 60

TTTCTCACCG TTCCCCTTTG GCAOCCGCTA CACCCTTTTG CTTTGGTTCT GCC ATG 116 

Met 
1 

CCG ATG TAG CAG GTA AAG CCC TAT CAC CGG GGC CGC GCG CCT CTC CGT 164
Pro Met Tyr Gln Val Lys Pro Tyr His Gly Gly Gly Ala Pro Leu Arg
5 10 15 

GTG GAG CTT CCC ACC TGC ATG TAG CGC CTC CCC AAC GTG CAC GGC AGG 212 
Val Glu Leu Pro Thr Cys Met Tyr Arg Leu Pro Asn Val His Gly Arg 
20 25 30 

AGC TAG GCC CCA CCO CCG GGC GCT GGC CAC GTG CAG GAA GAG TCT AAC 260
Ser Tyr Gly Pro Ala Pro Gly Ala Gly His Val Gln Glu Glu Ser Asn
35 40 45 

CTC TCT CTG CAA GCT CTT GAG TCC CGC CAA GAT GAT ATT TTA AAA CGT 308
Leu Ser Leu Gln Ala Leu Glu Ser Arg Gln Asp Asp Ile Leu Lys Arg
50 55 60 65 

CTG TAT GAG TTG AAA GCT GCA GTT GAT GGC CTC TCC AAG ATG ATT CAA 356 
Leu Tyr Glu Leu Lys Ala Ala Val Asp Gly Leu Ser Lys Met Ile Gln
70 75 80 

ACA CCA GAT GCA GAC TTG GAT GTA ACC AAC ATA ATC CAA GCG GAT GAG 404 
Thr Pro Asp Ala Asp Leu Asp Val Thr Asn Ile Ile Gln Ala Asp Glu
85 90 95 

CCC ACG ACT TTA ACC ACC AAT GCG CTG GAC TTG AAT TCA GTG CTT GGG 452 
Pro Thr Thr Leu Thr Thr Asn Ala Leu Asp Leu Asn Ser Val Leu Gly
100 105 110 
AAG GAT TAG GGG GCG CTG AAA GAG ATC GTG ATC AAC GCA AAC CCG GCC 500
Lys Asp Tyr Gly Ala Leu Lys Asp Ile Val Ile Asn Ala Asn Pro Ala
115 120 125 

TGC CCT GGC CTC TCC CTG GTT GTG CTG GAG AGG CTG CTC TGT GAG GAG 548 
Ser Pro Pro Leu Ser Leu Leu Val Leu His Arg Leu Leu Cys Glu His
130 135 140 145 

TTC AGG GTG CTC TCC ACG GTG GAG ACG GAG TCC TCG CTC AAG AGG GTG 596 
Phe Arg Val Leu Ser Thr Val His Thr His Ser Ser Val Lys Ser Val
150 155 160 

CCT GAA AAG CTT CTC AAC TGC TTT GGA GAA CAG AAT AAA AAA GAG GCC 644
Pro Glu Asn Leu Leu Lys Cys Phe Gly Glu Gln Asn Lys Lys Gln Pro
165 170 175 

CGC GAA GAC TAT CAG CTC GGA TTC ACT TTA ATT TGG AAG AAT GTG GCC 692
Arg Gln Asp Tyr Gln Leu Gly Phe Thr Leu Ile Trp Lys Asn Val Pro
180 185 190 

AAC AGO CAG ATG AAA TTC AGC ATC CAG AGO ATC TGC CGC ATC GAA GGC 740 
Lys Thr Gln Met Lys Phe Ser Ile Gln Thr Met Cys Pro Ile Glu Gly
195 200 205 

GAA GGO AAC ATT GCA OCT TTC TTG TTC TCT CTG TTT GGC CAG AAG CAT 788 
Glu Gly Asn Ile Ala Arg Phe Leu Phe Ser Leu Phe Gly Gln Lys His
210 215 220 225 

AAT GCT GTC AAC GCA ACC CTT ATA GAT AGC TGG GTA CAT ATT GCG ATT 836 
Asn Ala Val Asn Ala Thr Leu Ile Asp Ser Trp Val Asp Ile Ala Ile
230 235 240 

TTT CAG TTA AAA GAG GGA AGC ACT AAA GAA AAA GCG GCT GTT TTC OCC 884
Phe Gln Leu Lys Glu Gly Ser Ser Lys Glu Lys Ala Ala Val Phe Arg
245 250 255 

TCC ATG AAC TCT GCT CTT GGG AAG AGC GCT TGG CTC GCT GGC AAT GAA 932 
Ser Met Asn Ser Ala Leu Gly Lys Ser Pro Trp Leu Ala Gly Asn Glu 
260 265 270 

CTC ACC GTA GCA GAC GTG GTG CTG TGG TCT GTA CTC CAG CAG ATC GGA 980 
Leu Thr Val Ala Asp Val Val Leu Trp Ser Val Leu Gln Gln Ile Gly
275 280 285 

GGC TGC AGT GTG ACA GTG GCA GCG AAT GTG CAG AGG TGG ATG AGG TCT 1028 
Gly Cys Ser Val Thr Val Pro Ala Asn Val Gln Arg Trp Met Arg Ser
290 295 300 305

TGT GAA AAC CTG GCT CCT TTT TAAGAGGGCC CTCAAGCTCC TTAAGTGAAT 1079 
Cys Glu Asn Leu Ala Pro Phe 
310 

TGCCGTAACT GATTTTAAAG GGTTTAGATT TTAAGAATGG T^:CTCTTTC;i TGCCTATT^T 1139 

CAGTAAGGGG ACTTGTATTA GAGTCAGAGT CTTTTTATTT AGGCCAGTTG TCAAGTGTCA 1199

ATAAAAGCGC ATCATGTAAT TTAAAAAAAA AAAA 1233 



(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 312 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Pro Met Tyr Gln Val Lys Pro Tyr His Gly Gly Gly Ala Pro Leu
1 5 10 15

Arg Val Glu Leu Pro Thr Cys Met Tyr Arg Leu Pro Asn Val His Gly
20 25 30

Arg Ser Tyr Gly Pro Ala Pro Gly Ala Gly His Val Gln Glu Glu Ser
35 40 45

Asn Leu Ser Leu Gln Ala Leu Glu Ser Arg Gln Asp Asp Ile Leu Lys
50 55 60

Arg Leu Tyr Glu Leu Lys Ala Ala Val Asp Gly Leu Ser Lys Met Ile
65 70 75 80

Gln Thr Pro Asp Ala Asp Leu Asp Val Thr Asn Ile Ile Gln Ala Asp
85 90 95

Glu Pro Thr Thr Leu Thr Thr Asn Ala Leu Asp Leu Asn Ser Val Leu
100 105 110

Gly Lys Asp Tyr Gly Ala Leu Lys Asp Ile Val Ile Asn Ala Asn Pro
115 120 125

Ala Ser Pro Pro Leu Ser Leu Leu Val Leu His Arg Leu Leu Cys Glu
130 135 140

His Phe Arg Val Leu Ser Thr Val His Thr His Ser Ser Val Lys Ser
145 150 155 160

Val Pro Glu Asn Leu Leu Lys Cys Phe Gly Glu Gln Asn Lys Lys Gln
165 170 175

Pro Arg Gln Asp Tyr Gln Leu Gly Phe Thr Leu Ile Trp Lys Asn Val
180 185 190

Pro Lys Thr Gln Met Lys Phe Ser Ile Gln Thr Met Cys Pro Ile Glu
195 200 205

Gly Glu Gly Asn Ile Ala Arg Phe Leu Phe Ser Leu Phe Gly Gln Lys
210 215 220

His Asn Ala Val Asn Ala Thr Leu Ile Asp Ser Trp Val Asp Ile Ala
225 230 235 240

Ile Phe Gln Leu Lys Glu Gly Ser Ser Lys Glu Lys Ala Ala Val Phe
245 250 255

Arg Ser Met Asn Ser Ala Leu Gly Lys Ser Pro Trp Leu Ala Gly Asn
260 265 270

Glu Leu Thr Val Ala Asp Val Val Leu Trp Ser Val Leu Gln Gln Ile
275 280 285

Gly Gly Cys Ser Val Thr Val Pro Ala Asn Val Gln Arg Trp Met Arg
290 295 300

Ser Cys Glu Asn Leu Ala Pro Phe
305 310

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 900 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: mRNA

(B) LOCATION: complement (1..900)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



ACACCCCOCC AATTTCT6TA TTTTTAOTAG AGA06AG0TT TTACCATGTT GGCCAGGCTA 60
6TCT0QAACT CCTQACCTCA GGTQATCOOC CO6CCTO00C CTCCCAAAOT GCTGGGATTA 120
CAGGCGTGAG CCACGGOGCC CGOOCTOQAT AAATCTTTTA AAAGATAAAA 6TCTGA6TGA 180
GTCCCTGGCC GGCCGGCACA GATGCCGOGG TGOGGCOOTO AACOGQTTGG GACCCGCTC6 240
CTCCGGCCTG CGGGGACCOG GGCCAGCAGC COOTCOOCGC GO6TCC0CAC TGGGCGGGGG 300
GCCCOGOGCT CCTAOCTGGA CGTGGCCAGG CCCGOCGCTO 06CCGTA0CT CCTGCOGTGC 360
ACGTTGGGGA GCOGGTACAT GCAG6TGGGA AGCTCCACAC OGAGAGGCGC GCCGCCCCCG 420
TGATACCGCT TTACCTGGTA CATCG6CATG GCAGAACCAA AGCAAAAGGG GGTAG0G06T 480
GCCAAAGGCC AAOGCTCAGA AACCGTCAGA GGTCACGACG GAGACCGOCC ACCTCCCTTC 540
TGACCCTGCT GCGGGCGTTC GGGAAAACGC AGTCCGGTGT GCTCTGATTG GCCCAGGCCC 600
TTTGACGTCA CGAAGTCGAC CTTTGACAGA GCCAATAGGC GAAAAGGAGA GACGGGAAGT 660
ATTTTTCCCC CCCCGCCCCC AAAGGGTCGA GCACAACCTC GAAAGCAGCC AATGGGAGTT 720
CAGGACGCGC AGCGCCTCTC GGAGCCCTCG AGGGAACTTT CCCAGTCCCC GAGGCGGATC 780
COCTCTTCCA TCCATGGAGC GAGCTGACAO CTCGAGCTGA GCGGGGCTCG CAOTCTTCCG 840
GTGTCCCCTC TCGCCCGCCC TCTTTGAGAC CCAOCGCATT CCAACCTCCC TGGAAATGGG 900
1. A segment of cDNA consisting of the nucleotide sequence shown
in Figure 2.

2. A vector comprising the segment of DNA of claim 1.

3. A host cell which comprises the vector of claim 2. 

4. A composition consisting essentially of a protein consisting of the
amino acid sequence shown in Figure 2.

5. A composition of protein JTV1 as shown in Figure 1, wherein said
composition is free of other human proteins.

6. A segment of cDNA which encodes the amino acid sequence of
JTV1 protein shown in Figure 2.

7. A cDNA probe wherein said cDNA consists of between 15 and 1176
contiguous nucleotides of the sequence shown in SEQ ID NO:1.